On Evaluating and Comparing Conversational Agents

Authors

Anu Venkatesh

Chandra Khatri

Ashwin Ram

Fenfei Guo

Raefer Gabriel

Ashish Nagar

Rohit Prasad

Ming Cheng

Behnam Hedayatnia

Angeliki Metallinou

Rahul Goel

Shaohua Yang

Anirudh Raju

Abstract

Conversational agents are exploding in popularity. However, much work remains in the area of non-goal-oriented conversations, despite significant growth in research interest over recent years. To advance the state of the art in conversational AI, Amazon launched the Alexa Prize, a $2.5 million university competition in which sixteen selected university teams built conversational agents to deliver the best social conversational experience. The Alexa Prize gave the academic community a unique opportunity to perform research with a live system used by millions of users. The subjectivity associated with evaluating conversations is a key element underlying the challenge of building non-goal-oriented dialogue systems. In this paper, we propose a comprehensive evaluation strategy with multiple metrics designed to reduce subjectivity by selecting metrics that correlate well with human judgement. The proposed metrics provide a granular analysis of the conversational agents that is not captured in human ratings. We show that these metrics can be used as a reasonable proxy for human judgement. We provide a mechanism to unify the metrics for selecting the top-performing agents, which has also been applied throughout the Alexa Prize competition. To our knowledge, this is to date the largest setting for evaluating agents, with millions of conversations and hundreds of thousands of ratings from users. We believe that this work is a step towards an automatic evaluation process for conversational AIs.

Similar resources

A Black-box Approach for Response Quality Evaluation of Conversational Agent Systems

The evaluation of conversational agents or chatterbots question answering systems is a major research area that needs much attention. Before the rise of domain-oriented conversational agents based on natural language understanding and reasoning, evaluation was never a problem, as information retrieval-based metrics were readily available for use. However, when chatterbots began to become more doma...

Evaluating Embodied Conversational Agents in Collaborative Virtual Environments

There are currently no evaluation methods specific to ECAs in CVEs and traditional evaluation methods are limited in their applicability and consequently unlikely to address the full range of aspects now inherent in such systems. We argue that a combination of controlled experimentation, quasi-experiments, review-based evaluation and heuristic expert reviews is needed. To operationalise these t...

Scripting and Evaluating Affective Interactions with Embodied Conversational Agents

This paper describes the results obtained and ongoing agenda of a research project on embodied conversational agents, carried out at the University of Tokyo. The main focus points of the project are the development of scripting languages for controlling life-like agents and the modeling of affective interactions between agents and human users. Furthermore, the project aims at evaluating the imp...

Social Dialogue with Embodied Conversational Agents

The functions of social dialogue between people in the context of performing a task are discussed, as well as approaches to modelling such dialogue in embodied conversational agents. A study of an agent’s use of social dialogue is presented, comparing embodied interactions with similar interactions conducted over the phone, assessing the impact these media have on a wide range of behavioural, ta...

Reflections on Jennifer Saul's View of Successful Communication and Conversational Implicature

Saul (2002) criticizes a view on the relationship between speaker meaning and conversational implicatures according to which speaker meaning is exhaustively comprised of what is said and what is implicated. In the course of making her points, she develops a couple of new notions which she calls “utterer-implicature” and “audience-implicature”. She then makes certain claims about the relationshi...

Journal:

CoRR

Volume abs/1801.03625

Pages -

Publication date 2018
